Algorithm-Based Fault Tolerance in Linear Algebra Tasks
Abstract
A modification of the weighted checksum method is proposed that makes it possible to derive fault-tolerant versions of most linear algebra algorithms. The goal is to detect and correct calculation errors caused by transient hardware faults. Using the proposed method, a fault-tolerant version of the Faddeeva algorithm is designed in this paper. Compared with the original algorithm, the new one requires approximately O(N) additional multiply-add operations. In return, it can detect and correct a single error in an arbitrary row or column of the input data matrices at each step of the algorithm. Finally, the results of an experimental verification of the proposed algorithm are presented.
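As a rough illustration of the general weighted checksum idea the abstract refers to, the sketch below protects a plain matrix-vector product rather than the Faddeeva algorithm, and it is not the paper's modified scheme. The helper names (`encode`, `matvec`, `check_and_correct`), the weights 1, 2, ..., n, and the tolerance are all assumptions made for this example. It also assumes that at most one data entry of the result is corrupted and that the checksum entries themselves are intact.

```python
def encode(A):
    """Append a plain and a weighted column-checksum row to matrix A.

    Hypothetical helper: the weights 1..n are an assumption for this sketch.
    """
    n = len(A)
    cols = range(len(A[0]))
    plain = [sum(A[i][j] for i in range(n)) for j in cols]
    weighted = [sum((i + 1) * A[i][j] for i in range(n)) for j in cols]
    return A + [plain, weighted]


def matvec(M, x):
    """Ordinary matrix-vector product; checksum rows are carried along."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def check_and_correct(y_ext, tol=1e-9):
    """Detect and fix a single corrupted data entry.

    y_ext holds the data entries followed by the plain and weighted
    checksum entries. Returns (data, index_fixed_or_None).
    """
    *y, c_plain, c_weighted = y_ext
    d1 = sum(y) - c_plain                                     # error magnitude
    d2 = sum((i + 1) * v for i, v in enumerate(y)) - c_weighted
    if abs(d1) <= tol and abs(d2) <= tol:
        return y, None                                        # no error seen
    if abs(d1) <= tol:
        raise ValueError("mismatch is not a single data-entry error")
    pos = d2 / d1                                             # weight of bad entry
    j = round(pos) - 1
    if abs(pos - round(pos)) > tol or not 0 <= j < len(y):
        raise ValueError("mismatch is not a single data-entry error")
    y[j] -= d1                                                # undo the error
    return y, j


A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
x = [1, 2, 3]
y_ext = matvec(encode(A), x)   # data entries plus two checksum entries
y_ext[1] += 5                  # simulate a transient fault in one entry
fixed, where = check_and_correct(y_ext)
print(fixed, where)
```

The two checksum rows cost one extra multiply-add per column each, so the overhead grows linearly with the matrix dimension. The ratio of the weighted mismatch to the plain mismatch gives the position of the faulty entry, and the plain mismatch gives the amount to subtract.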
Similar resources
Improving the palbimm scheduling algorithm for fault tolerance in cloud computing
Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...
Adaptive Algorithm-based Fault Tolerance for Parallel Computations in Linear Systems
This paper presents a dynamically adaptive stabilization scheme for parallel matrix computation. The scheme performs automatic error detection and correction by inserting redundant but concurrent tracer computations within the folds of the regular computation. It also eliminates the costly row interchange used in classical pivoting. A fault-tolerant double wavefront matrix algorithm for a MI...
On-line soft error correction in matrix-matrix multiplication
Soft errors are one-time events that corrupt the state of a computing system but not its overall functionality. Soft errors normally do not interrupt the execution of the affected program, but the results of the affected computation can no longer be trusted. A well-known technique for correcting soft errors in matrix–matrix multiplication is algorithm-based fault tolerance (ABFT). While ABFT achieves mu...
Algorithmic Techniques for Fault Detection for Sparse Linear Algebra
The growing complexity and variability of future computing systems are making it increasingly likely that individual circuits will produce erroneous results, especially when operated in low-energy modes. Previous techniques for Algorithm-Based Fault Tolerance (ABFT) [7] have been proposed for detecting errors in dense linear operations, but they have high overhead in the context of sparse problems....
Soft Error Resilient QR Factorization for Hybrid System
As general-purpose graphics processing units (GPGPUs) are increasingly deployed for scientific computing because of their raw performance advantages over CPUs, fault tolerance has become more of a concern than it was when they were used exclusively for graphics applications. The pairing of GPUs with CPUs to form hybrid computing systems for better flexibility and perform...
Publication date: 2005